% ------------------------------------------------------------------
\chapter{Wrappers and pre-trained models}\label{s:wrappers}
% ------------------------------------------------------------------

It is easy enough to combine the computational blocks of \autoref{s:blocks} ``manually''. However, it is usually much more convenient to use them through a \emph{wrapper} that can implement CNN architectures given a model specification. The available wrapper are briefly summarised in \autoref{s:wrappers-overview}.

\matconvnet also comes with many pre-trained models for image classification (most of which are trained on the ImageNet ILSVRC challenge), image segmentation, text spotting, and face recognition. These are very simple to use, as illustrated in \autoref{s:pretrained}.

% ------------------------------------------------------------------
\section{Wrappers}\label{s:wrappers-overview}
% ------------------------------------------------------------------

\matconvnet provides two wrappers: SimpleNN for basic chains of blocks (\autoref{s:simplenn}) , and DagNN for more complex graphs. simple wrapper for the common case of a linear chain (\autoref{s:dagnn}).  

% ------------------------------------------------------------------
\subsection{SimpleNN}\label{s:simplenn}
% ------------------------------------------------------------------

The SimpleNN wrapper is suitable for networks consisting of linear chains of computational blocks.  It is largely implemented by the \verb!vl_simplenn! function (evaluation of the CNN and of its derivatives), with a few other support functions such as \verb!vl_simplenn_move! (moving the CNN between CPU and GPU) and \verb!vl_simplenn_display! (obtain and/or print information about the CNN).

\verb!vl_simplenn! takes as input a structure \verb!net! representing the CNN as well as input \verb!x! and potentially output derivatives \verb!dzdy!, depending on the mode of operation. Please refer to the inline help of the \verb!vl_simplenn! function for details on the input and output formats. In fact, the implementation of \verb!vl_simplenn! is a good example of how the basic neural net building block can be used together and can serve as a basis for more complex implementations.

% ------------------------------------------------------------------
\subsection{DagNN}\label{s:dagnn}
% ------------------------------------------------------------------

The DagNN wrapper is more complex that SimpleNN as it has to support arbitrary graph topologies. Its design is object oriented, with one class implementing each layer type. While this adds complexity, and makes the wrapper slightly slower for tiny CNN architectures (e.g. MNIST), it is in practice much more flexible and easier to extend.

DagNN is implemented by the \verb!dagnn.DagNN! class (under the \verb!dagnn! namespace).

% ------------------------------------------------------------------
\section{Pre-trained models}\label{s:pretrained}
% ------------------------------------------------------------------

\verb!vl_simplenn! is easy to use with pre-trained models (see the homepage to download some). For example, the following code downloads a model pre-trained on the ImageNet data and applies it to one of MATLAB stock images:
\begin{lstlisting}[language=Matlab]
% setup MatConvNet in MATLAB
run matlab/vl_setupnn

% download a pre-trained CNN from the web
urlwrite(...
  'http://www.vlfeat.org/matconvnet/models/imagenet-vgg-f.mat', ...
  'imagenet-vgg-f.mat') ;
net = load('imagenet-vgg-f.mat') ;

% obtain and preprocess an image
im = imread('peppers.png') ;
im_ = single(im) ; % note: 255 range
im_ = imresize(im_, net.normalization.imageSize(1:2)) ;
im_ = im_ - net.normalization.averageImage ;
\end{lstlisting}
Note that the image should be preprocessed before running the network. While preprocessing specifics depend on the model, the pre-trained model contain a \verb!net.normalization! field that describes the type of preprocessing that is expected. Note in particular that this network takes images of a fixed size as input and requires removing the mean; also, image intensities are normalized in the range [0,255].

The next step is running the CNN. This will return a \verb!res! structure with the output of the network layers:
\begin{lstlisting}[language=Matlab]
% run the CNN
res = vl_simplenn(net, im_) ;
\end{lstlisting}

The output of the last layer can be used to classify the image. The class names are contained in the \verb!net! structure for convenience:
\begin{lstlisting}[language=Matlab]
% show the classification result
scores = squeeze(gather(res(end).x)) ;
[bestScore, best] = max(scores) ;
figure(1) ; clf ; imagesc(im) ;
title(sprintf('%s (%d), score %.3f',...
net.classes.description{best}, best, bestScore)) ;
\end{lstlisting}

Note that several extensions are possible. First, images can be cropped rather than rescaled. Second, multiple crops can be fed to the network and results averaged, usually for improved results. Third, the output of the network can be used as generic features for image encoding.

% ------------------------------------------------------------------
\section{Learning models}
% ------------------------------------------------------------------

As \matconvnet can compute derivatives of the CNN using back-propagation, it is simple to implement learning algorithms with it. A basic implementation of stochastic gradient descent is therefore straightforward. Example code is provided in \verb!examples/cnn_train!. This code is flexible enough to allow training on NMINST, CIFAR, ImageNet, and probably many other datasets. Corresponding examples are provided in the \verb!examples/! directory.

% ------------------------------------------------------------------
\section{Running large scale experiments}
% ------------------------------------------------------------------

For large scale experiments, such as learning a network for ImageNet, a NVIDIA GPU (at least 6GB of memory) and adequate CPU and disk speeds are highly recommended. For example, to train on ImageNet, we suggest the following:
\begin{itemize}
\item Download the ImageNet data~\url{http://www.image-net.org/challenges/LSVRC}. Install it somewhere and link to it from \verb!data/imagenet12!
\item Consider preprocessing the data to convert all images to have an height 256 pixels. This can be done with the supplied \verb!utils/preprocess-imagenet.sh! script. In this manner, training will not have to resize the images every time. Do not forget to point the training code to the pre-processed data.
\item Consider copying the dataset in to a RAM disk (provided that you have enough memory) for faster access. Do not forget to point the training code to this copy.
\item Compile \matconvnet with GPU support. See the homepage for instructions.
\end{itemize}

Once your setup is ready, you should be able to run \verb!examples/cnn_imagenet! (edit the file and change any flag as needed to enable GPU support and image pre-fetching on multiple threads).

If all goes well, you should expect to be able to train with 200-300 images/sec.